2 research outputs found

    Semantic feature reduction and hybrid feature selection for clustering of Arabic Web pages

    According to the literature, high-dimensional data reduces the efficiency of clustering algorithms. Clustering Arabic text is particularly challenging because capturing the meaning of the text requires deep semantic processing. To overcome these problems, feature selection and reduction methods have become essential for identifying the appropriate features and reducing the high-dimensional space. There is a need for a suitable design of feature selection and reduction methods that yields a more relevant, meaningful and reduced representation of Arabic texts and so eases the clustering process. The research developed three methods for analyzing the features of Arabic Web text. The first method is a hybrid feature selection that selects the informative term representation within Arabic Web pages. It combines three feature selection methods, Chi-square, Mutual Information and Term Frequency–Inverse Document Frequency, to build a hybrid model. The second method is a latent document vectorization method that represents documents as probability distributions in the vector space. It overcomes the problem of high dimensionality by reducing the dimensional space. To extract the best features, two document vectorizer methods were implemented, known as the Bayesian vectorizer and the semantic vectorizer. The third method is an Arabic semantic feature analysis used to improve the capability of Arabic Web analysis. It ensures a good design for the clustering method, optimizing clustering ability when analysing these Web pages, by overcoming the problems of term representation, semantic modeling and dimensional reduction. Experiments were carried out with k-means clustering on two different data sets. The methods provided solutions to reduce high-dimensional data and to identify the semantic features shared between similar Arabic Web pages that are grouped together in one cluster. The pages were clustered according to the semantic similarities between them, and the resulting clusters had a small Davies–Bouldin index and high accuracy. This study contributed to research on clustering algorithms by developing three methods to identify the most relevant features of Arabic Web pages.
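
    The abstract reports cluster quality through the Davies–Bouldin index, where smaller values indicate compact, well-separated clusters. The following is a minimal sketch of that index in plain Python, assuming Euclidean distance and centroid-based scatter. The function name `davies_bouldin` and the toy two-dimensional points are illustrative assumptions, not the study's implementation or data.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Return the Davies-Bouldin index of a labelled set of points.

    points: list of equal-length numeric tuples (hypothetical feature vectors)
    labels: list of cluster labels, one per point
    Assumes at least two clusters with distinct centroids.
    """
    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster: the coordinate-wise mean of its points.
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance from its points to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster, take its worst (largest) similarity ratio to any
    # other cluster, then average those worst cases over all clusters.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (an assumption for illustration): two pairs of nearby points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1])   # nearby points share a cluster
mixed = davies_bouldin(points, [0, 1, 0, 1])  # distant points share a cluster
print(good < mixed)
```

    On this toy data the grouping that keeps nearby points together scores lower than the mixed grouping, which is the sense in which a small index signals a better clustering.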

    Requirements analysis for SBS system and study review process iteration during requirements phase

    This paper presents the experience gained and discusses the work done under the title “Requirements Analysis For SBS System And Study Review Process Iteration During Requirements Phase” by the author during her Industrial Attachment 2 period, from 13th October 2008 to 13th March 2009, at HeiTech Padu, Malaysia. The purpose of this paper is to analyze the requirements for the Shared Banking Services (SBS) system and to study review processes that can be applied in the requirements phase in order to obtain a quality requirements document with fewer errors. The SBS system aims to provide the banking services of one selected bank through the post office, giving customers an alternative way to carry out their banking. Studies were carried out to understand why the review process is important in the requirements phase and to show how to make it more effective by iterating it during the development of the SRS document. The methodology used to achieve the objectives of this paper began with initiation and planning, followed by an analysis of the Shared Banking Services system, a study of best practices in the requirements engineering process, and a study of the requirements review process. Finally, the output was documented. The development team of the SBS system used the ADVISE methodology, which is based on HeiTech Padu's development process. The deliverables of analyzing the SBS system are the Software Requirement Specification (SRS), the Requirement Traceability Matrix and the User Manual documents. A workflow is introduced to show how the review process can be iterated during development of the SRS document.